02. EHR Dataset Levels

EHR Dataset Levels Key Points

With EHR datasets, there are three levels.

  • Line
  • Encounter
  • Longitudinal

These levels are extremely important in healthcare data, and being able to identify and work with data at the correct level will ensure that you start with the correct data type and dataset to feed to your models.

Overview of the Three Levels

Line Level

Line Level: A denormalized or disaggregated representation of all the things that might happen in a medical visit or encounter.

Think of a visit to the doctor for bronchitis.

Your line-level data entries could be:

  • A diagnosis code of bronchitis
  • A medication code for a cough suppressant
  • A procedure code for a test for bronchitis

Each entry is its own line. One line could be a diagnosis or a medication that was prescribed; another line could hold information on a lab test that the doctor ordered to inform the diagnosis.
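As a rough illustration, the bronchitis visit could be laid out as a small line-level table. This is a minimal sketch using pandas; the column names (`patient_id`, `encounter_id`, `code_type`, `code`) and the code values are placeholders invented for this example, not real medical codes.

```python
import pandas as pd

# Hypothetical line-level rows for one bronchitis visit.
# Column names and code values are illustrative placeholders, not real codes.
line_df = pd.DataFrame(
    {
        "patient_id": ["patient_a"] * 3,
        "encounter_id": ["enc_001"] * 3,
        "code_type": ["diagnosis", "medication", "procedure"],
        "code": ["DX_BRONCHITIS", "RX_COUGH_SUPPRESSANT", "PX_BRONCHITIS_TEST"],
    }
)

# One row per code: several lines, all belonging to the same encounter.
print(line_df)
print("rows:", len(line_df))
print("unique encounters:", line_df["encounter_id"].nunique())
```

The point to notice is that the same encounter id repeats on every row, which is the signature of line-level data.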

Encounter Level

Encounter Level: Also known as the visit level. This is the aggregated information from the line level for one encounter. The lines can be collapsed into a single row, with the codes held in arrays or lists.

Using the example above, the encounter level for that visit would include the diagnosis code of bronchitis, medication code for a cough suppressant, and the procedure code for a test for bronchitis in one array or list.
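One way to sketch this collapse in pandas is to group the line-level rows by encounter and gather the codes into a list. The column names and code values below are placeholders, and the second encounter (`enc_002`) is invented only so the grouping has more than one group to work on.

```python
import pandas as pd

# Hypothetical line-level data: placeholder column names and code values.
line_df = pd.DataFrame(
    {
        "patient_id": ["patient_a"] * 5,
        "encounter_id": ["enc_001"] * 3 + ["enc_002"] * 2,
        "code": [
            "DX_BRONCHITIS",
            "RX_COUGH_SUPPRESSANT",
            "PX_BRONCHITIS_TEST",
            "DX_BRONCHITIS",
            "RX_COUGH_SUPPRESSANT",
        ],
    }
)

# Collapse the lines into one row per encounter, with the codes held in a list.
encounter_df = (
    line_df.groupby(["patient_id", "encounter_id"])["code"]
    .agg(list)
    .reset_index(name="codes")
)

print(encounter_df)
```

After the aggregation there is exactly one row per encounter id, so the row count and the unique encounter count match.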

Longitudinal Level

Longitudinal Level: Also known as the patient-level view. This level aggregates the patient history and can show how the accumulation of visits/encounters leads to some clinical impact.

Continuing with our example above, if the patient contracts bronchitis often over a series of years, we might gain some insight into a possible autoimmune disease, or know what to prescribe the patient when they start showing symptoms.
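A patient-level rollup of this kind might look like the sketch below, which counts how many of each patient's visits carried a bronchitis diagnosis. All ids, dates, and code values are made up for illustration, and the column names are assumptions rather than a standard schema.

```python
import pandas as pd

# Hypothetical encounter-level rows: one row per visit, codes held in a list.
encounter_df = pd.DataFrame(
    {
        "patient_id": ["patient_a", "patient_a", "patient_a", "patient_b"],
        "encounter_id": ["enc_001", "enc_002", "enc_003", "enc_004"],
        "encounter_date": pd.to_datetime(
            ["2017-01-10", "2018-02-15", "2019-01-20", "2019-03-05"]
        ),
        "codes": [
            ["DX_BRONCHITIS", "RX_COUGH_SUPPRESSANT"],
            ["DX_BRONCHITIS"],
            ["DX_BRONCHITIS", "PX_BRONCHITIS_TEST"],
            ["DX_SPRAIN"],
        ],
    }
)

# Flag the visits that carry the (placeholder) bronchitis diagnosis code.
encounter_df["has_bronchitis"] = encounter_df["codes"].apply(
    lambda codes: "DX_BRONCHITIS" in codes
)

# Roll up to one row per patient: the longitudinal (patient-level) view.
patient_df = (
    encounter_df.groupby("patient_id")
    .agg(
        total_encounters=("encounter_id", "nunique"),
        bronchitis_encounters=("has_bronchitis", "sum"),
        first_visit=("encounter_date", "min"),
        last_visit=("encounter_date", "max"),
    )
    .reset_index()
)

print(patient_df)
```

Each patient now occupies a single row, and the recurring diagnosis shows up as a count across years rather than as separate visits.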

Now that you have a basic understanding of the different levels, we'll explore them a bit more with examples.

Example Overview of Dataset Levels

EHR Dataset Levels Continued

As stated above, EHR records are commonly represented at one of three levels: line, encounter, or longitudinal. Let's review this one more time with the visual above.

Patient A had an encounter on January 20th of 2019, where 3 different codes were produced. Patient A also had another encounter on March 20th of 2019 with its own set of codes. Taken together, these encounters and their line-level codes add up to the longitudinal level of knowledge we have on that particular patient.

The longitudinal view is an important level for aggregating the patient history. It is where you connect information across visits/encounters and roll it up to determine trends across time.

Why are EHR levels important?

Using the wrong EHR dataset level can lead to major errors when building models, because data preparation is then done with faulty assumptions.
For example, a common problem is the duplication of encounter information when you take a line-level dataset and treat it as an encounter-level dataset.

Example:

A particular encounter might have 50 lines and that might be treated inadvertently as 50 distinct encounters when it is actually one encounter. This has the effect of upsampling certain common values for that encounter in your dataset, but also creates a great deal of noise since those 50 lines might have only slight differences.
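The inflation described here can be reproduced with a toy dataset. In this sketch, `admission_type` is a hypothetical encounter attribute that repeats on every line of the encounter; counting it per row and counting it per encounter give very different pictures.

```python
import pandas as pd

# Hypothetical line-level data: one encounter with 50 lines, another with 2.
# "admission_type" is an invented encounter attribute repeated on every line.
line_df = pd.DataFrame(
    {
        "encounter_id": ["enc_001"] * 50 + ["enc_002"] * 2,
        "admission_type": ["emergency"] * 50 + ["elective"] * 2,
    }
)

# Treating every row as an encounter inflates the 50-line encounter's value.
naive_counts = line_df["admission_type"].value_counts()

# Keeping one row per encounter counts each encounter once.
per_encounter = line_df.drop_duplicates(subset="encounter_id")
true_counts = per_encounter["admission_type"].value_counts()

print("counted per row:", naive_counts.to_dict())
print("counted per encounter:", true_counts.to_dict())
```

Dropping duplicates on the encounter id is only safe here because the attribute is constant within an encounter; attributes that vary from line to line need a real aggregation instead.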

Further, it is easy to select the wrong encounter from the patient record. There might be a case where you only want the earliest or latest visit or state for a patient, or a specific time step for your model. Picking the wrong one can cause many issues that might not become apparent until the modeling or deployment phases of your project.
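Selecting the earliest or latest encounter explicitly, rather than taking whichever row happens to come first, could be sketched as follows. The column names and patient B's visit are assumptions made for this example; patient A's dates reuse the two visits from the overview.

```python
import pandas as pd

# Hypothetical encounter-level data, deliberately not in date order.
encounter_df = pd.DataFrame(
    {
        "patient_id": ["patient_a", "patient_a", "patient_b"],
        "encounter_id": ["enc_002", "enc_001", "enc_003"],
        "encounter_date": pd.to_datetime(
            ["2019-03-20", "2019-01-20", "2019-02-01"]
        ),
    }
)

# Sort by date so that "first" and "last" are well defined for each patient.
ordered = encounter_df.sort_values(["patient_id", "encounter_date"])

# Keep exactly one encounter per patient.
earliest = ordered.drop_duplicates(subset="patient_id", keep="first")
latest = ordered.drop_duplicates(subset="patient_id", keep="last")

print(earliest)
print(latest)
```

Without the explicit sort, the "first" row per patient would depend on the storage order of the data, which is how a wrong or effectively random encounter ends up in a training set.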

How do you know which level you are at?

How do you know the dataset level for your data?

This is fairly easy if you collect some key metrics from your dataset. There are different ways to do this; two simple ones are listed below.

  1. The total number of rows in the dataset. This is a simple calculation with len().
  2. The number of unique encounters or visits. Find the field(s) that identify a unique encounter, then count the distinct values with nunique().

Example

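# Total number of rows in the dataset
total_rows = len(fake_df)
# Number of unique encounters, using the column that identifies an encounter
total_unique_encounters = fake_df['encounter'].nunique()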
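When no single column identifies an encounter, the same count can be taken over a combination of fields. The sketch below assumes, purely for illustration, that a patient id plus a visit date is enough to identify an encounter; the column names and values are invented.

```python
import pandas as pd

# Hypothetical dataset with no single encounter id column.
fake_df = pd.DataFrame(
    {
        "patient_id": ["patient_a", "patient_a", "patient_a", "patient_b"],
        "visit_date": ["2019-01-20", "2019-01-20", "2019-03-20", "2019-01-20"],
        "code": ["DX_1", "RX_1", "DX_2", "DX_1"],
    }
)

# Assumed composite key: the columns that together identify one encounter.
encounter_key = ["patient_id", "visit_date"]

total_rows = len(fake_df)
# Count the distinct combinations of the key columns.
total_unique_encounters = len(fake_df[encounter_key].drop_duplicates())

print(total_rows, total_unique_encounters)
```

This choice of key would merge two separate visits by the same patient on the same day, so it is worth confirming with whoever owns the data what actually defines an encounter.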

Line Level

From here we do some simple calculations to figure out our dataset level.

If the total number of rows is greater than the number of unique encounters, it is at the line level.

Again using our example from above:

total_rows = len(fake_df)
total_unique_encounters = fake_df['encounter'].nunique()

If the output was

  • total_rows = 43464
  • total_unique_encounters = 3259

then the comparison

  • print(total_rows > total_unique_encounters)

would print True.

Therefore this dataset would be at the line level.

Encounter Level

If the total number of rows is equal to the number of unique encounters, it is at the encounter level.

Again using our example from above:

total_rows = len(fake_df)
total_unique_encounters = fake_df['encounter'].nunique()

If the output was

  • total_rows = 3464
  • total_unique_encounters = 3464

then the comparison

  • print(total_rows == total_unique_encounters)

would print True.

Therefore this dataset would be at the encounter level.

Longitudinal Level

For the longitudinal or patient level, you will see multiple encounters grouped under a patient, and you might not even see the encounter id field, since this information is collapsed/aggregated under a unique patient id. In this case, the total number of rows should equal the total number of unique patients.
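The three checks can be folded into one small helper. This is a heuristic sketch, not a definitive test: `infer_dataset_level` is a hypothetical function written for this example, and it assumes the encounter id lives in a column named `encounter` and the patient id in a column named `patient_id`.

```python
import pandas as pd


def infer_dataset_level(df, encounter_col="encounter", patient_col="patient_id"):
    """Guess the dataset level from row counts (a heuristic, not a guarantee)."""
    total_rows = len(df)

    # If an encounter id column exists, compare rows with unique encounters.
    if encounter_col in df.columns:
        total_unique_encounters = df[encounter_col].nunique()
        if total_rows > total_unique_encounters:
            return "line"
        return "encounter"

    # No encounter id column: one row per patient suggests the patient level.
    if patient_col in df.columns and total_rows == df[patient_col].nunique():
        return "longitudinal"

    return "unknown"


# Tiny made-up datasets, one per level.
line_df = pd.DataFrame(
    {"patient_id": ["a", "a", "b"], "encounter": ["e1", "e1", "e2"]}
)
encounter_df = line_df.drop_duplicates(subset="encounter")
patient_df = pd.DataFrame({"patient_id": ["a", "b"], "n_encounters": [1, 1]})

for name, df in [("line_df", line_df), ("encounter_df", encounter_df), ("patient_df", patient_df)]:
    print(name, "->", infer_dataset_level(df))
```

Row counts alone cannot separate every case. For instance, a dataset in which every patient has exactly one encounter looks the same at the encounter and patient levels, so the result should be treated as a starting point for inspection.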

EHR Dataset Levels

QUIZ QUESTION:

Match the correct Dataset Level to each example or definition.

ANSWER CHOICES:

Example/Definition, each to be matched with a Dataset Level:

  • Gives the whole view of the patient.
  • 027004Z (AKA Heart Surgery)
  • Has all of the codes for a given visit to a healthcare provider
  • Is the code for a procedure, medication, or diagnosis
  • Example Output: [procedure_1,2 ],[diagnosis_1,2,3][medication_12,3,4,5]
  • Would be grouped by "patient_id"

SOLUTION:

Example/Definition, grouped by Dataset Level:

Line Level:

  • 027004Z (AKA Heart Surgery)
  • Is the code for a procedure, medication, or diagnosis

Encounter Level:

  • Has all of the codes for a given visit to a healthcare provider
  • Example Output: [procedure_1,2 ],[diagnosis_1,2,3][medication_12,3,4,5]

Longitudinal Level:

  • Gives the whole view of the patient.
  • Would be grouped by "patient_id"

Line Level

Which of the following would tell you that your data is at the line level?

SOLUTION: `total_rows > total_unique_encounters` = `True`

Encounter Level Quiz

Which of the following would let you know that your data is likely at the encounter level?

SOLUTION:
  • `total_rows == total_unique_encounters` = `True`
  • total rows = 745,838, unique encounters = 745,838

Reflect

QUESTION:

Why is it so important to make sure your dataset is at the correct level before using it to build a model?

ANSWER:

Using the incorrect dataset level can lead to major errors when building models, because data preparation is done with faulty assumptions. One result can be the duplication of encounter information.

Also, selecting the wrong or a random encounter from the patient record can have a large negative effect on your model that you may not see until deployment.

These are only a few potential problems; you may come up with some others as well! Thanks for completing this.